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1 Introduction 



We consider the processing of the data produced by the two infrared imaging photometers 
onboard the ESA Herschel satellite PQ, namely PACS [2] and SPIRE [3]. These data 
are affected by several impairments and producing high quality images from the raw 
instrument output is a difficult task. In fact, in addition to the ubiquitous thermal noise, 
these data are also affected by offsets, saturations, pointing errors, glitches and drifts. 
As a result, the data are normally reduced by means of a pipeline composed of several 
steps, where each step takes care of a specific impairment and tries to remove it from 
the data and to produce a set of clean, updated data to be used as input for the next 
step. Tipically the last step receives data where all the impairments except the noise have 
been removed. The task of the last step is the production of a sky image (map) from the 
noisy data, a process called map making, which can be implemented using several well 
established methods, e.g. [I]. 

One of the impairments affecting the Herschel data is a time varying deviation from 
the baseline. This deviation, which is usually termed a drift, is typically slowly varying 
with respect to the signal and the noise but is normally much larger than the noise and 
often comparable to or larger than the signal component. Therefore it is mandatory to 
remove the drift before the map making step. This report is devoted to presenting a drift 
removal method suitable for use with Herschel data. Specifically, we assume that the drift 
can be represented as a polynomial and develop a Subspace Least Square (SLS) approach 
to its estimation, using concepts and techniques from linear algebra. Subspace analysis is 
a well known technique and it was used in the past do perform drift estimation, e.g. [3], 
but, to the best of our knowledge, the specific approach proposed here is novel. Moreover 
subpace based drift estimation seems to have not been applied to Herschel data before. 
The method presented here is employed in the Unimap map makeiQ [6]. 

While our main application is the reduction of Herschel data, we note that the drift 
can be found also in other types of data, for example in fMRI data |5J. Moreover the 
approach presented in this report is not limited to a polynomial drift, but can handle 
also other drift models: it suffices that the drift can be modeled as a linear combination 
of given waveforms and that the data is a redundant, linear observation. Therefore the 
dedrifting method presented here could be exploited for other types of data too. For this 
reason the presentation is divided in three parts. The first part, section [21 is devoted to a 
brief summary of basic facts from linear algebra that are used in the report. In the second 
part, section [3j we introduce the SLS method, without any reference to a specific data 
type. In the third part, section HJ we specialise the method to the case of the polynomial 
drift affecting the Herschel data. 

^^More precisely, Unimap exploits a smart, iterative implementation of the method presented here. 
This implementation will be described in a future work. 
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2 Preliminaries 



2.1 Notation 

We use uppercase letters to denote matrices, lowercase letters to denote vectors and 
scalars. A superscript T denotes matrix or vector transposition. We say that a matrix 
with N rows and M columns is a N x M matrix. We use / to denote the identity matrix. 
We use E{x} to denote the expected value of x. 

2.2 Basic facts from linear algebra 

Let us summarise some basic facts which will be useful and can be found in any textbook 
covering linear algebra, e.g. [?]. We will work in the vector space dt N of the N x 1 real 
column vectors but the results are easily extended to the complex case. In this space, the 
inner product of two vectors x and y is the scalar product x T y, the two vectors are said 
orthogonal if x T y = and a set of vectors v- t for i = 1 , . . . , n is said to be an independent 
set if Ya a i v i = only when the scalars a« are all zero. 

Given a vector space V, a subspace E C V is any subset of V that is closed under vector 
sum and multipication by a scalar, i.e. such that if x G E and j/GS then (ax + by) G E. 
Note that a subspace is a vector space. 

Given a space V, a set of its vectors B = {vi, v n } is said a basis if it is an indepen- 
dent set and if any vector x of V can be obtained as a linear combination of the vectors of 
B, i.e. it can be written as x = J2i a i v i f° r some scalars a^. The dimension of a subspace 
is the number of vectors in one of its basis. 

The intersection of two subspaces E and A, denoted by E PI A, is a subspace and is 
constituted by all the vectors belonging to both E and A. If E D A = (the null vector) 
the subspaces are said disjoint. 

The sum of two subspaces E and A, denoted by T = E + A, is a subspace and is 
constituted by all the vectors that can be written as x = s + d where s G E and d G A. If 
E and A are disjoint they are said to be complementary in T and T is said the direct sum 
of E and A, denoted by T = E © A. In this case a vector in T can be written as x = s + d 
in a unique way (i.e. if also x = w + h where w G E and h G A then w = s and h = d). 

Consider T = E © A and one of its vector x = s + d. The vector s G E is said the 
projection of x in E along A and the vector d G A is said the projection of x in A along 
E. It exists a matrix II such that s = Ux and d = (I — Il)a;. The matrix IT is termed 
the projector into E along A and the matrix (/ — II) the projector into A along E. All 
the projectors are idempotent, i.e. II 2 = II. Conversely all the idempontent matrices are 
projectors into some subspace. 

Given a subspace E of a space T the set of all the vectors in T that are orthogonal 
to all the vectors of E is a subspace termed the orthogonal complement of E in T and is 
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denoted by E . Note that r = E © E , i.e. T can be regarded as the direct sum of any 
of its subspace and the corresponding orthogonal complement. Therefore a vector of T 
can be uniquely written as x = s + s 1 - where s G E and s G E -1 . In this case s is the 
orthogonal projection of x into E and s 1 - is the orthogonal projection of x into E^. 

Consider a set of n vectors M = {vi, ...,v n }. The set of all the vectors that can 
be obtained as a linear combination of the % i.e. all vectors than can be written as 
x — J2i diVi for some scalars subspace and is said the span of M. 

Given an n x m matrix X the span of its columns is called the range of X and is 
a subspace of Ji™ denoted as Ex- Note that a vector y in the range can be written as 
y = Xa where a is a m x 1 vector. Another important subspace is the null of the matrix, 
which is a subset of Ji m , constituted by all the vectors a G 9ft m such that Xa = 0. 

Consider an n x m matrix X such that its columns are an independent set. Consider 
the range Ex and the orthogonal complement E x . Then any vector x G 9ft n can be 
decomposed as x = s + s 1 - where s G Ex and s 1 - G E x . The orthogonal projection of x 
into E, namely the vector s, is given by s = Tlxx where Ilx is a n x n matrix given by 
Ilx = X(X T X)~ 1 X T which is termed the orthogonal projector into Ex- Similarly, the 
orthogonal projection of x into E -1 , namely the vector s -1 , is given by s 1 - = Hj^x where 
n x = I — U x . The null of Ilx is E x and the null of II X is Ex- Any orthogonal projector 
is idempotent and symmetrical, i.e. Ilx = n x . 

3 Subspace least square drift estimation 
3.1 Data model and problem statement 

Consider a signal vector m G 9ft A/ which is observed with a linear instrument to produce a 
vector of iV >> M observed data s = Pm where P is an iV x M matrix representing the 
linear instrument. Assume that the observations are affected by two additive distrubances. 
The first is a random N x 1 vector n representing thermal noise. The second is a N x 1 
vector y representing a drift in the data. In the following we assume that the drift can 
be expressed as y = Xa where X is a iV x K known matrix, with K « N and a 
is an unknown K x 1 vector. In words the condition requires that the drift is a linear 
combination of the columns of X. Then the noisy observation, denoted by d, is a vector 
written as 

d = Pm + Xa + n = s + y + n. (1) 

We consider the problem of producing an estimate y* of the drift vector y. Then the 
drift estimate can be subtracted from the data vector to produce an updated data vector 
d = d — y* where, ideally, the drift has been removed. Such an updated data vector can 
next be used to estimate the signal m. In the model of (pQ) we assume that the matrices P 
and X are known, that the vectors m and a are unknown deterministic vectors and that 
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n is a random process. We also assume that P and X are full rank, because this simplifies 
the presentation and is true for our main application, i.e. Herschel data, but it would 
not be difficult to remove such hypothesis. Also, we make the development assuming real 
numbers, but the extension to complex numbers is not difficult. 

3.2 Noiseless case analysis 

As a preliminary step we consider how to produce a drift estimate y* when the noise is 
absent, i.e. we set n — in (Q]). This case is simpler to analyse but allows to introduce 
all the tools that we need in the solution of the noisy case. 

One problem in the drift estimation is the presence of the signal component s in the 
data. In fact the presence of the signal will bias the drift estimate. In order to solve this 
problem we consider the span of the matrix P, which is a subspace £ p that will be termed 
the signal subspace, and the corresponding projector, which is lip = P(P T P)~ 1 P T . We 
also consider the orthogonal complement Ep, which will be termed the nosignal subspace, 
and the corresponding projector lip = (I — Up). Next we note that s = Pm lies in the 
signal subspace so that IlpS = 0. Then we can get rid of the signal by projecting d into 
the nosignal subspace. In fact the projection is z = Upd and, since d — s + y, we obtain 

z = nj,y 

showing that the signal has been removed. 

Now we note that the drift vector belongs to the subspace spanned by the columns 
of X, which will be denoted by £x and termed the drift subspace. Then an estimate of 
the drift can be obtained by selecting a vector y* G T,x that, after being projected into 
the nosignal subspace, is equal to z. Since any vector v in Ex can be written as v = Xa, 
after the projection the vector will be in the form IlpXa = Wa, where we introduced the 
N x K matrix W = IlpX. Then we can look for y* by solving the following equation in 
the variable vector a 

Wa = z. (2) 

Once we have a solution a*, the drift estimate is y* = Xa*. We now proceed to better 
discuss the solution of the last equation. 

Since we assumed that the noise is absent, the last equation has at least one solution, 
because z is indeed the projection of a drift vector into the nosignal subspace. However if 
Ex fl Sp 7^ 0, there will be more than one solution. In fact, suppose that a* is a solution 
and y* = Xa* is the corresponding drift estimate, such that Hpy* = z. Now consider 
any vector v G Sx (iSp. Since v G Ex, it can be written as v — Xb for some vector b. 
Moreover, since v G Sp, we have Ilpf = 0. Then it is not difficult to show that a* + b is 
also a solution. In fact 

W(a* + b) = Wa* + TljiXb = Z + Uj 3 v = z. 
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The last reasoning shows that the general solution of the system is in the form a* + b 
where a* is a particular solution and b is such that v = Xb is in the signal subspace. We 
now proceed to discuss a method to find a particular solution of (jSJ). 

As we have seen equation (T5j) has at least one solution. This means that z lies in 
the subspace spanned by the columns of the matrix W, which will be denoted by T*w 
We have also seen that, when S^ fl Sp 7^ 0, the equation has infinitely many solutions. 
This means that z can be expressed in more than one way as a linear combination of the 
columns of W implying that the columns of W are a dependent set. We now present a 
procedure to remove columns from the matrix W to construct a matrix W such that the 
columns of W still span Svp and are an independent set. In other words the columns of 
W are a basis for Ejy. The procedure also produces a partition of T, x into the direct sum 
of two subspaces, denoted by S^ and S^ that will be useful later. 

The procedure is started by initialising k = 0, X^ = A and X^ to a void matrix. 
Next the following steps are iterated. 

1. Check if S^ (fe) n S P = 0. If true set X = X^ k \ X = XW and stop. 

2. Identify a non zero vector v £ £j£(*o H Sp. 

3. Since v £ T x(k) we have v = J2ih x i where the X{ are the columns of X^ and the foj 
are appropriate coeffiecients not all equal to zero. 

4. Select a non zero coefficient. Suppose it is the j-th one. 

5. Set X( k+1 ^ equal to X^ with the j-th column removed. 

6. Set X^ 1 ) equal to X^ k ' with the vector v appended as last column. 

7. Let k — k + 1. Go to step 1. 

As a preliminary comment, note that the procedure can be repeated no more than K 
times, because if it is repeated K times the matrix X^ will be a void matrix, the test 
in step 1 will be positive and the procedure will stop. In practice the procedure will be 
repeated a times where a is the dimension of the subspace Ex n Sp. 

The output of the procedure is constituted by the N x (K — a) matrix X and by the 
N x a matrix X. The span of X is a subspace that will be termed the drift-nosignal 
subspace. The span of A is a subspace that will be termed the drift-signal subspace. 
Furthermore we can consider the N x (K — a) matrix W = UpX. These matrices and 
spaces are important in the development, because of the features summarised in the 
following lemmas which are proved in the appendix. 

Lemma 1. The columns of the matrix W = UpX are an independent set. 

Lemma 2. The columns of the matrix W = lip A are in the span of the columns of 
W, i.e. they are all obtainable as linear combinations of the columns of W. 

Lemma 3. We have T, x = © £jj , D Sp = and = T, x fl Sp. 

We now discuss how to find a particular solution of system (T5]). To this end consider 
the following, reduced system in the variable (K — a) x 1 vector a 

Wa = z (3) 
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and note that, since the columns of W are a subset of those of W, a solution a* of the 
reduced system immediately brings a solution of the full system of ([2]). The full solution 
a* can be obtained by completing a* with some zero coefficients, placed in correspondence 
of the columns of W that are missing in W. Then, in order to find a particular solution 
of the full system, we need to solve the reduced system. 

The reduced system surely has a solution because z is in the span of W due to Lemma 
2. Moreover, it is not difficult to verify that the reduced system only has one solution, 
because fl Sp = 0. Since the reduced system has only one solution, the solution is 
also the unique solution of the associated system of normal equations [7] 

W T Wa = W T z (4) 

and since, from Lemma 1, we know that the columns of W are an independent set, 
implying that the matrix (W /T V4 / ) is non singular, the solution is 

a* = [W T W)- l W T z. 

Let us now develop an expression for the drift estimate. Since we have an expression 
for a solution of the reduced system, we can construct a solution, a*, of the full system by 
completing with zero coefficients and compute the drift estimate as y* = Xa*. However 
it is immediate to verify that y* can be obtained directly from the solution of the reduced 
system as 

y* = Xa*. 

Moreover, by replacing a*, z = Upd and W = UpX and by using the fact that lip is 
symmetric and idempotent, we can eventually write the drift estimate as 

y* = X{X T Il^X)- l X T Il^d = Hd (5) 

where we introduced the matrix H = X(X T TlpX)~ l X T Tlp-. The latter expression gives 
a drift estimate that depends only on the data vector d and on the matrices P and X. 

3.3 Subspace Least Square drift estimation 

Let us now consider the drift estimation when the noise is present. As before, we can get 
rid of the signal by projecting the data vector in the nosignal subspace. Therefore, as 
before, we compute z = Upd. 

Again a meaningful drift estimate would be obtained by selecting a vector in that, 
after being projected into the nosignal subspace, is equal to z. To this end we should look 
for a vector a such that 

Wa = z. 
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However, since z is now a noisy vector, in general an exact solution to the preceeding 
overdetermined system does not exist. An obvious way out is to find a least square 
solution, i.e. to look for a vector a that minimizes \Wa — z\ 2 . However the least square 
solution is not unique, because, again, it is not dificult to show that given a solution a* 
and a vector v = Xh G Ex fl Ep then also (a* + b) is a solution. Like already done in 
the noiseless case, since the span of W and W is the same, this problem can be solved by 
replacing the original minimization with the minimization of iH^a— z\ 2 , with the advantage 
that the latter minimization only has one solution, which can be obtained as the solution 
of the associated system of normal equations. The normal equations are identical to those 
of the noiseless case given by @ so we find that the optimum a is still 

a* = [W T W)- l W T z 

and obtain a corresponding drift estimate, y* = Xa*, which is identical to fl5]), namely 

y* = Ed. 

Then the latter expression gives the drift estimate also in the noisy case and will be 
referred as the Subspace Least Square (SLS) drift estimate. 

Finally, by subtracting y* from d an updated data vector is obtained as 

d = d- Hd= (I - H)d. (6) 

In the next section we better characterize the updated data vector and the drift estimate. 



3.4 Analysis of the SLS estimate 

In order to better characterize the SLS estimate we need to study the matrix H . To this 
end we first note that, as is easy to check, H 2 = H so that H is idempotent. Therefore 
if is a projector. We now proceed to make a convenient partition of 3t N . Specifically 
let us consider the subspace which is obtained by the sum of Sp and E^, denoted by 
Xp, = Sp + S^. Since, from Lemma 3, SpflS-^ = this is in fact a direct sum, i.e. 
Sp = Sp ©S^. Next let us introduce the orthogonal complement of Ep, denoted by E^. 
Since ?R, N = Ep © E^ we get the following decompositon for ?R, N 

9^ = Ep©E^©E^ 

so that every vector v in §l N can be uniquely written as 

v = v P + Vx + v D ± (7) 

where Vp G Ep, G E^ and v D ± G Spx. 
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We now study the multiplication of the matrix H with a vector taken from one of the 
three subspaces just introduced. We firstly consider a vector v G E^-. This vector can be 
written as v = Xb for some vector b so that 

Hv = X(X T Tlj>X)- 1 X T Tl^Xb = Xb = v «6E|. (8) 

Now consider a vector v G Ep, which can be written t> = Pb for some vector 6. We have 

Hv = XiX^X^X^Pb = v G Ep (9) 

where we used the fact that UpP = 0. Finally consider a vector t> G E^. This vector is 
orthogonal to all the vectors of Ep, therefore it is in Ep so that IIpt> = v. This vector is 
also orthogonal to all the vectors of E^ so that X T v = 0. Then 

Hv = Xi^U^X^X^^v = X(X T Ii^X)~ l X T v = v G E^. (10) 

The last three equations show that H is the projector into E^ along Ep©E^. Conversely, 
(/ — if) is the projector into Ep © E^ along E^. 

We are now ready to discuss the drift estimate and updated data vector. The drift 
estimate is y* = Hd where d is given by (II]). Let us now express the noise, drift and signal 
components using the writing of (I7j). The noise can be written as n = rip + + n D ±. 
The signal lies in the signal subspace, so that s = sp. Concerning the the drift y, note 
that, based on Lemma 3, it can be written as y = y^ +y x where y-% G E^- and y x G T, x . 
However, from the same Lemma, we also have y x G Ep. Therefore the drift can be 
written as y = y x + yp. Now, using these writings and equations (|E1 EH [10]), it is easy to 
check that 

y* = y x + n x . 

The last equation shows that the SLS estimate is the drift component falling into the drift- 
nosignal subspace while the component falling into Ep is not detected. The estimate is 
affected by a random error given by n x . Note that E{n x } = E{Hn} = if the noise is 
zero mean. 

Let us now discuss the updated data vector which, from ((6]), can be written as 

d = s + y + n 

where s — (I — H)s is the updated signal, y — (I — H)y is the updated drift and 
n = (I — H)n is the updated noise. Since s G Ep it will not be modified by (/ — H) so 
that 

s = s 

telling us that the signal part is unmodifed in the updated data vector. Now consider the 
drift. Using y = y x + yp we get 

V = Vp 
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telling us that the drift component falling into the signal subspace will leak into the 
updated data vector. Then the best case is when the intersection of the signal and drift 
subspaces only contains the zero vector, because in this case y = and the drift is entirelly 
removed. More generally the drift estimation will be good and the approach useful as long 
as most of the drift energy falls outside the signal subspace. 
Finally, using n = n P + n-^ +n D ±, the updated noise is 

h = np + n D ±. 

Note that the updated data vector will be passed to a noise removal algorithm and the lat- 
ter equation shows that the noise component falling in the signal subspace is not changed 
in the updated data vector. This means that the performance of the noise removal algo- 
rithm will not be affected by the drift removal. In fact any good noise removal method 
should not be affected by the noise falling outside the signal subspace. Also, for the 
updated noise we have 

E{n} = E{(I - H)n} = (I - H)E{n} = 0, 

showing that if the original noise is zero mean the same is true for the updated noise, and 

R h = E{hn T } = (I - H)E{nn T }(I - H) T = (I - H)R n (I - H) T , 

showing the correlation matrix of the updated noise can be computed from the correlation 
matrix of the original noise. 

3.5 Direct and Iterative SLS 

Let us briefly discuss some implementation issues. The updated data vector is obtained 
by subtracting the drift estimate from the original data vector. Then we only discuss the 
computation of the drift estimate. 

The drift estimate can be computed directly, using (Q. This requires performing a 
mutliplication with the matrix H = X{X T IipX)~ l X T Iip. The multiplication can be 
carried out in successive steps: we firstly multiply the vector by lip; then we multiply 
the result by X T \ and so on. The most difficult step is the multiplication by (A^IlpX) -1 
since this requires producing the inverse of (X T UpX), which may not be feasible if the 
dimension of the matrix is large. 

An alternative way of producing the drift estimate is using an iterative method. For 
example, we can solve the normal equations of (HI) using the Parallel Conjugate Garadient 
(PCG) [8] method. The PCG method requires no matrix inversion but only to perfom 
the multiplication of a vector by the matrix W T or W, which is a much simpler task. 

Having concluded the general presentation of the SLS approach, in the next sections 
we study how this approach can be applied to the specific case of the Herschel data. 
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4 Application to Herschel data 



4.1 Herschel data model 

In order to use the SLS approach for Herschel data, we need to cast the Herschel data 
into the model of equation (pQ) and derive the matrices P and X. To this end we firstly 
discuss the data acquisition process. 

The PACS and SPIRE instruments onboard the Herschel Satellite are imaging pho- 
tometers made by arrays of bolometers, measuring the power emitted in several infrared 
bands. The number of bolometers in the array will be denoted by iV& and varies depending 
on the instrument and on the observation band. 

The arrays observe a field of view in the sky, covering an area of about 6 square 
arcmin. However a typical Herschel observation covers a larger sky area, which may be 
some square degrees wide. To observe the area, the Herschel telescope is moved along one 
or more sets of parallel scan lines. During the scan each bolometer is sampled to produce 
a sequence of N r readouts which is termed a timeline. The set of all the iV& timelines, 
toghether with the corresponding pointing information, constitutes the observation raw 
output. By stacking the timelines, the oservation output can be represented compactly 
with a iV x 1 vector d where N = ■ N r . 

We assume that all impairments except the drift and the noise have been removed in 
prior processing steps. Then the data vector can be written asd = s + y + n where s is 
the signal, y is the drift and n is the noise. In the rest of this section we deepen the noise 
and signal models, while the drift component will be discussed in the next section. 

The noise component is typically modeled as a zero mean, stationary Gaussian process 
with a power spectrum given by the sum of two terms. The first term is white noise 
with flat spectrum N w (f) = Nq, the second term is a correlated noise with spectrum 
N c (f) = {fk/f) a No where is called the knee frequency. This term is also referred 
as the 1//- noise and dominates the spectrum at low frequencies, causing long lasting 
departures from the zero level. 

A model for the signal component is obtained by assuming that the observed sky is a 
pixelised image, i.e. assuming that the sky is partitioned into a set of M non overlapping 
squares (pixels) and that the flux is constant in each pixel. Then, by stacking all the 
pixels, the observed sky can be represented as a M x 1 vector m, which is termed a map. 
Since each readout is accompanied by a pointing information we can assign each readout 
to a pixel, i.e. each element of vector d to an element of vector m. This correspondance 
can be cast into a N x M matrix P = {pk,i}, termed the pointing matrix, such that 
Pk,i = 1 if the k-th readout falls into the i-th map pixel and pk,i = otherwise. In this 
way the signal component can be expressed as s = Pm, which fits into the model of (pQ). 
Note that P is a sparse matrix the rows of which are all zero except for one element which 
is one. 
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As a comment note that in the model just introduced we must consider some limits 
to the pixel size. Specifically the noise and drift removal algorithms need redundancy 
in order to do a good job. In practice this requires that each pixel is observed several 
times so that N » M. Therefore the pixel size shall be large enough to guarantee that 
condition, a fact which puts a lower limit to the resolution of the final image. 

As a further comment note that each timeline is affected by an unknown offset because 
the instruments' readouts are not absolute but relative. Tyipically, this offset is roughly 
removed in the first processing step, by forcing each timeline mean or median to zero. 
The fine offset compensation is carried out by the image formation algorithm (the map 
maker) which follows the drift removal. However the absolute offset cannot be estimated 
and only the relative timelines' offsets can be corrected. As a result the final image itself 
is affected by an unknown offset that has to be estimated using independent calibration 
data. 

4.2 Herschel drift model 

The drift component of a single timeline is normally well approximated by a polynomial 
of low degree, say less that five. In order to develop a model for the drift let us initially 
assume that there is only one timeline, with N r samples, which is modeled as a polynomial 
of degree N a . In this case the elements of the drift vector can be written as yi = J2k=o l x \ a k 
for i = 1, N r , where for k = 0, N a are the polynomial coefficents and Xi are real 
numbers. Since the sampling of the bolometer is done at regular times, the x^ shall be 
equispaced numbers, of the form = i5 + a. Now note that 

N a N a 
k=0 k=0 

and by developping the powers in the last expression and then grouping the powers of i 
we can write 

N a 

k=0 

where the are appropriate coefficients. The last expression shows that the values yi 
when Xi = i5 + a can also be obtained as a polynomial with coefficents b^ when Xi = i. 
Then, without loss of generality, we can assumqj 5 = 1 and a = 0, i.e. that Xi = i. We 

2 Note that, in the practical implementation of SLS, we shall select a and S in order to improve 
numerical stability. 



13 



now introduce the N r x (N a + 1) matrix X given by 

/ 1 



X 



X\ x\ 
1 x 2 



.Na \ 
1 \ 



y 2 

■ L 2 



X 3 x? 3 



V 1 x Nt X 



X 



X 2 

T Na 
x 3 



X N r J 



the (N a + 1) x 1 vector a 



f a o \ 
a i 

V a N a j 



and note that the drift can be written as 



y = Xa 

which fits the data model of ((H). We also note that X is a Vandermonde matrix so that 
its columns are linearly independent when N r > (N a + 1). 

The model just introduced is easily expanded to the case when there are timelines 
of N r samples, each affected by independent polynomial drifts, with polynomial order 
iV a . In this case the vector d is obtained by stacking the timelines and similarly can be 
done with the drift component y. Then, upon denoting the /i-th timeline drift as a vector 
y( h \ using the development of the previous paragraph, this vector can be expressed as 
y(h) — jflW where is a column vector with the drift coefficients of the h-th. timeline. 
Furthermore, by introducing a block diagonal matrix of dimension NxK, with N = N},-N r 
and K = N b ■ (N a + 1), given by 





fx 





. 


. 


\ 







X 


. 


. 




X = 








X . 


. 






lo 





. 


. X 


) 



the drift vector can be written as 



y 



Xa 



where a is a K x 1 vector obtained by stacking the vectors. 

To proceed we note that the map making algorithm employed to produce the final 
image may be capable of removing the drift too, at least to a certain extent. In fact 
the map maker is essentially a noise removal algorithm and the drift can be seen as a 
low frequency noise. In this case there is no need to separately estimate and remove the 
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drift. However we also note that the drift observed in Herschel data often has common 
components. Typically one can identify a common drift, affecting all the timelines, in 
addition each timeline's drift. Furthermore, when the array is partitioned into subarrays, 
like in PACS, a subarray drift component is present. Such common components are 
not well removed by the map maker, because the mapper normally assumes that the 
noise processes of the timelines are uncorrelated, which is not true in the presence of the 
common drift. In this case we can run the derifting algorithm to remove the correlated 
drift components only and leave to the mapper the burden of the single drift removal. 
Theferore we now proceed to generalise the drift model just introduced to handle the case 
of common drift components. 

In order to develop a model for the common drift components, let us assume that the 
timelines are divided into N g groups and that each group is affected by a polynomial drift 
with order N a . It is not difficult to construct a matrix X and a vector a suitable to model 
this situation. Without loss of generality we can assume that all the timelines of a group 
are stacked successively in the data vector. Now consider the h-th group and denote by 
gh the number of timelines in the group and by the vector of the coefficients of the 
drift. All the timelines in the group are affetcted by this drift therefore we introduce the 
vector y( h > obtained by stacking copies of the drift, one for each timeline in the group. 
Then we stack g^ copies of the matrix X to produce a matrix X^ h ' given by 



X 



so that we can write = X^a^ h ' to model the drift of the group. The complete model 
is obtained by stacking all the y^ vectors into a single N x 1 vector y which is the drift, 
with N = Nb ■ N r , by stacking all the vectors into a single K x 1 vector a, with 
K = N g ■ (N a + 1), by introducing the N x K matrix 



X 



( X® 
X® 
X® 

















(12) 



and by noting that y = Xa, which fits into the model of equation ([TJ). Note that when 
N g = Nt we are back to the case of a single drift per timeline, therefore the last model is 
the most general and the only one we need to study. 
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4.3 Projecting in the signal subspace 

Having developed a suitable model for the Herschel data, in the next sections we consider 
several practical and theoretical issues concerning the implementation of the SLS drift 
estimate for Herschel data. As a starting point we discuss the projection of the data 
vector into the nosignal subspace, i.e. the computation of z = Upd, since this is the first 
step in the method. In practice, z can be obtained as z = d — v where v is the projection 
of d into the signal subspace, v = Upd. Then it is sufficient to discuss the projection into 
the signal subspace. 

The projector into the signal subspace is Hp = P(P T P)~ 1 P T which, by introducing 
the matrix R = (P T P)- 1 P T , can be written as Hp = PR. The projection of d into the 
signal subspace can thus be written as v — PRd. Now consider the vector r = Rd. It 
is a M x 1 vector which can be regarded as an estimate of the map (in fact it is the 
LS estimate) and is termed the naive or simple projection or rebinned map. To better 
understand how the naive map is obtained note that it is not difficult to verify that the 
vector P T d is a map where the i-th element is equal to the sum of all the readouts falling 
into the i-th pixel and that (P T P) is a M x M diagonal matrix where the i-th diagonal 
element is equal to the number of readouts falling into the i-th pixel. Then it is seen 
that the i-th element of r is equal to the average of all the readouts falling into the i-th 
pixel. Now we can express the projection as v — Pr and note that this operation, termed 
the back-projection of the r map, amounts at producing the data vector that would be 
obtained if the sky was given by the vector r. In summary, the projection of a data vector 
d into the signal subspace amounts at firstly rebinning d into a naive map and next at 
back-projecting a data vector from the naive map. 

Both the projection and backprojection operations are simple to perform and can be 
efficently implemented without really producing and storing the matrix P. Therefore the 
projection into the signal and nosignal subspaces poses no implementation problems. Also 
note that if one of the diagonal elements of (P T P) is zero then the matrix is singular, but 
this means that there is an unobserved pixel in the map and this can be solved by simply 
removing that pixel from the problem. Therefore in a well posed problem the (P T P) 
matrix is non singular. 

4.4 SLS for Herschel data 

Let us discuss the drift removal capability of the SLS approach for the specific case 
of Herschel data. To this end recall that the SLS approach will not remove the drift 
components falling in Sp while the rest is perfectly removed. Therefore it is useful to 
study the intersection of the signal and the drift subspaces, spanned by the columns of 
the P and X matrices respectively. This discussion is also useful in order to implement 
the procedure described in section 13.21 where we remove columns from the matrix X to 
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produce the matrix X. 

As a first point note that a constant Nxl vector c = (1, 1, 1) T lies in both subspaces. 
In fact when X is given by (fi~2l) . the constant vector can be obtained as c = Xa by properly 
selecting the coefficent vector a, so that c lies in the drift subspace. Now consider the 
projection of the c vector into the signal subspace, namely the vector t = Upc. As we 
have seen lip = PR, i.e. it can be seen as a naive mapping followed by a back-projection. 
Then, by rebinning c we obtain r = Rc which is easily shown to be a constant map, 
r = (1, 1, 1) T . Next, it is easy to verify that by back-projecting the constant naive map 
we obtain a vector t = Pr which is the constant vector, i.e. t = c. Then c is projected 
onto itself by lip and is therefore also in the signal subspace. 

Since the constant vector is in both subspaces, it is in the intersection of the two, 
together with all its scaled versions. Whether the intersection contains other vectors or 
not depends on the particular P, i.e. on the scan strategy and the pixel size, and cannot 
be discussed in general. However, since all the vectors in Ex, are obtained by stacking 
polynomials, as soon as the redundancy is moderately high, so that several redouts fall 
into each pixel, it is highly unlikely that one of these vectors also belongs to the £p 
subspace, since it should pass unchanged through lip, i.e. through a naive mapping and 
a backprojection operation. Therefore for all practical cases we can safely assume that 
the intersection contains the constant vector and no other vectors. 

Since Xx H Sp only contains constant vectors and since all other drift components 
are correctly estimated by the SLS, neglecting the noise the drift estimate can written 
as y* = y + c where c is an unknown constant vector and y is the actual drift. The 
latter equation says that, for practical Herschel data, the drift estimate is affected by an 
unknown offset but is otherwise exact. On the other hand the presence of an offset is 
no real problem, because we have seen in section 14.11 that the production of absolutely 
calibrated maps is impossible without the use of independent calibration data. Therefore 
if the drift estimation is affected by an offset, it makes no difference. In this sense we 
can say the the SLS approach entirelly removes the drift and is ideally suited for Herschel 
data. 

Finally, let us discuss how to produce the matrix X. To this end note that £x H Sp 
only contains the constant vector and that the constant vector is obtained by combining 
with the same coefficient all and only the columns of X where the first column of X is 
replicated. Then, according to the procedure described in section 13.21 to produce X we 
can remove anyone of these columns. The simplest choice is to remove the first column 
of X, which is one of such columns. 
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4.5 Drift removal with Direct and Iterative SLS 



The direct estimation of the drift can be obtained using (jSJ) and requires to compute 
Hd. Since H = X{X T TlpX)~ l X T Tlp the multiplication can be carried out in successive 
steps, without really computing and storing the H matrix, which would be unfeasible for 
Herschel data. The first step is to mutliply d by lip, which can be efficently implemented 
with naive mapping and back-projection, as we have seen in section H~3l Then we multiply 
the result, which is a N x 1 vector, by X T and note that the matrix X is obtained from X 
of (|12[) by removing one column, as we have seen in section |4~41 Then it is a N x (K — 1) 
sparse, block matrix, constructed from the X matrix. Then X does not really need 
to be produced and stored: the multplication can be efficently carried out storing only 
X, with a computational complexity similar to naive mapping. The result of the last 
step is a (K — 1) x 1 vector which needs to be multiplied with the (K — 1) x {K — 1) 
matrix (X T IlpX) _1 . This is the most delicate step and will be discussed in the next 
paragraph. It produces a (K — 1) x 1 vector containing the drift coefficients. The last 
step is the multplication of this vector by X. This amounts at the synthesis of the drift 
of each timeline which again can be efficently implemented using the X matrix and has a 
complexity similar to naive mapping. 

Let us now discuss the multiplication by (X T IIpX) -1 . We found no clever ways to 
implement this step and need to use the brute force approach. That is we produce the 
matrix (X T UpX) and compute and store its inverse. The production of the matrix can 
be efficently realised and is normally feasible. The result is a (K — 1) x (K — 1) matrix 
where K = N g ■ (N a + 1) which we need to invert. Whether this is feasible depends on K. 
As we mentioned the drift is well modeled by a low order polynomial, say less than five, 
then, in practice, K depends on the number of groups N g . For example suppose we want 
estimate a drift for each timeline, so that N g = iVj,, in a PACS blue observation using a 
polynomial order of 3. Since N b = 2048 for PACS blue we have K = 2048 -4 = 8192 
and the matrix is too big to be inverted. On the other hand if we only want to remove 
the subarray drift, since PACS has 8 subarrays we only need to compute the drift for 
N g = 8 groups and K = 8 • 4 = 32 which yields a matrix that can easily be inverted. 
In practice the use of direct SLS is limited to the removal of the common drifts. The 
iterative approach based on the PCG needs to be used when K is too high for the direct 
approach. 

4.6 Projecting in the drift subspace 

To conclude, let us discuss the implementation of the projection into the drift subspace, 
Tlx — X(X T X)~ 1 X T . Note that this projection is not needed in the context of the SLS as 
presented thus far. However it will be useful for a future, planned development, therefore 
we discuss it here. 
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The projection Uxd can be implemented in steps. The first step is multiplying d by 
X T and we have already seen that this step can be implemented efficently. Similarly the 
last step, i.e. the multiplication by X, is simple and has been already discussed. The only 
step worth discussion is the multiplication by (X T X)~ l which is discussed in the next 
paragraph. 

As a preliminary comment recall that X is a non singular, Vandermonde matrix so 
that (X T X)~ l exists. This is a (N a + 1) x (N a + 1) matrix that can be computed and 
stored. Now consider the problem of computing the inverse of (X T X). To this end note 
that, from (JHJ) we have (X^) T X^ = g h X T X. Then, from (02) we have 



X T X 



( 9l X T X 
g 2 X T X 











g,x T x 











Qn^X J 



showing that X T X is a block diagonal matrix, with N g blocks and a block dimension of 
(iV a + 1). Such a matrix is easy to invert. Indeed we have 



(x T xy 1 



91 










!)■> 




















.</■■! 











(x T xy 

9N„ 



\ 



J 



Then we see that the matrix has a simple structure. It does not need to be really produced 
and the multiplication by it can be efficently implemented by using only [X T X)~ 1 . 



5 Appendix 

Proof of Lemma 1. Note that, by construction, Y, x D Sp = 0. Now suppose that the 
columns of W are a dependent set. Then, denoting the columns with Wi, we can find 
coefficients hi not all zero such that 

biWi = 0. 

i 

Moreover, = HpXi where Xi is the i-th column of X. Since ffl 1 = Sp © Sp, we can 
decompose each column as x-i = xf + xf where xf G Sp and xf G £p\ The, replacing in 
the last equation we get 

£ b^x, = bilLUxf + 4) = E b i4 = o. 

i i i 



19 



where we used the fact that Hpxf = and that Hpxf- = xf. Now consider the vector v 
that is obtained as a linear combination of the Xi with the coefficients b iy i.e. 

v = Y b i x i- 

i 

Obviously v G E^ and v ^ because the coefficients are not all zero. Moreover note that 
v = Y hxi = Y H x f + x t)=Y hx? + Y hxi = Y b i x i- 

i i iii 

The last equation shows that v is obtained as the sum of vectors in the signal subspace 
and therefore v G Ep. Then v G E^ H Ep which contraddicts the fact that E^ fl Ep = 0. 

o 

Proof of Lemma 2. Since W = Ilp-X^ for some k, we can prove the lemma if 
we prove that, for any k, the columns of W are in the span of Ilp-X^. This is trivially 
true for k — 0, because = X. Then we can prove the thesis if we show that the 
columns of UpX^ are in the span of the columns of UpX^ k+1 \ To this end denote the 
columns of HpX^ by Wi = HpXi, where the Xj are the columns of X^ k \ Recall that, in 
the procedure, we look for a vector v which is in S^ (fe) n Ep, so that, using to denote 
the columns of P, 

V = Y a iPi = b * X i 
i i 

and remove from X^ a column Xj such that bj ^ 0. Then we have 

V = Y a iPi = b i X i + b 3 X r 
i i^j 

Multiplying the last expression by lip we obtain 

= Y, billpXi + bjUpXj 

which yields 

Wj = -j-Y b i w i 

showing that Wj is in the span of the Wi for i ^ j. Since the columns of TlpX^ are the 
Wi they are in the span of the columns of ITpX( fc+1 ) which are the Wi for i ^ j. o 

Proof of Lemma 3. Let us show that the columns of X^ togehter with the columns 
of X^ are a basis for E x , for any k. Since this is trivially true for k = 0, we can prove 
the thesis by showing that, if it is true at step k, it is true also at step k + 1. To this end 
assume that the columns of X^ togehter with the columns of X^ are a basis for Ex- 
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Note that at the successive step, all the columns will be unchanged except that the j-th 
column of X, namely Xj, is replaced by v. But v was selected in the span of X i.e. 

v = Y^ biXi = hxi + bjXj 

i i^j 

so that 

°3 i 

Then all the vectors that could be obtained as a linear combination of the vectors xi 
for all % (the columns of X) can also be obtained as a linear combination of the vectors 
Xi for i 7^ j and v. This implies that the span of (XW,XW) is identical to that of 
(X( fe+1 ),X( fe+1 )). In turn this implies that the columns of (X^ k+1 \ X^ k+1 ^) are a basis for 

Now we can prove the lemma. Based on the last paragraph, we know that the columns 
of (X, X) are a basis for £x so that £x = ^ x ® ^x- Moreover, by construction, we know 
that Y, x fl Sp = and that S x C S p . Then we only have to show that X x = Xx nSp. 
Since S x C S p we only have to prove that all the vectors of Ex fl £p are also in H x . To 
this end consider a vector i> e Xx HSp. Since it is in Ex we can write 

v = v x + Vx = J2 hxi + J2 °i h i 

i i 

where the Xj are the columns of X and the hi are the columns of X. Now we note 
that v x £ Sx flEp (because the columns of X are both in Sx and in Ep). Then also 
(u — i>x) e HSp (because it is a subspace). Moreover 

v ~ v x = Y b^i 

i 

so that (v — Vx) G £ x . From the last equations we have (v — v x ) £ £x and (f — fx) £ 
and since S^flSp = 0we have (y — v x ) = 0. This implies that v x = and that 
f = v x £ S x . o 
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